For this assignment we will use the same dataset as in the previous tasks, i.e. the data on WAIS subtest results. Let's recall the correlation matrix first.

We can see a moderate correlation among all of the variables; however, the picture completion score seems to correlate less with the other variables.

Let's now perform a principal component analysis on all of the variables. Since we are not sure whether all the tests have the same score range, we will use the scaled version, i.e. the variables are standardised and the analysis is based on the correlation matrix.

# Drop the first column (the group label) and standardise the four subtest scores
PCA <- prcomp(WAIS[2:ncol(WAIS)], scale. = TRUE)
PCA
## Standard deviations (1, .., p=4):
## [1] 1.6770917 0.7943561 0.6145604 0.4227024
## 
## Rotation (n x k) = (4 x 4):
##                    PC1        PC2         PC3         PC4
## Information  0.5490858 -0.1243367 -0.31862732 -0.76257579
## Similarities 0.5269680 -0.3145079 -0.47627735  0.62972166
## Arithmetic   0.4976878 -0.2807389  0.81830408  0.06221726
## Pictures     0.4160726  0.8982265  0.04488815  0.13437936
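
As a quick sanity check on this output, the correlation matrix can be recovered from it: for a scaled PCA, the correlation matrix equals V diag(sdev^2) V^T, where V is the rotation matrix. The sketch below hard-codes the numbers printed above (so it does not need the WAIS data); the names `sdev`, `rotation` and `R_hat` are chosen here for illustration only.

```r
# Values copied from the printed prcomp output above.
sdev <- c(1.6770917, 0.7943561, 0.6145604, 0.4227024)
rotation <- matrix(
  c(0.5490858, -0.1243367, -0.31862732, -0.76257579,
    0.5269680, -0.3145079, -0.47627735,  0.62972166,
    0.4976878, -0.2807389,  0.81830408,  0.06221726,
    0.4160726,  0.8982265,  0.04488815,  0.13437936),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Information", "Similarities", "Arithmetic", "Pictures"),
                  paste0("PC", 1:4))
)

# Spectral reconstruction of the correlation matrix: V diag(sdev^2) V^T.
R_hat <- rotation %*% diag(sdev^2) %*% t(rotation)

# The diagonal should be (approximately) 1, up to rounding in the printout.
round(R_hat, 2)
```

The off-diagonal entries in the `Pictures` row should come out smaller than the rest, in line with the remark about the correlation matrix.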

Let us look at the explained variability.

The scree plot above shows that the first principal component explains about 70 % of the variability, and together with the second component it explains roughly 86 %. It might therefore be enough to use only the first two components instead of the four original variables.
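
The percentages can be checked directly from the standard deviations printed by `prcomp`: the variance of each component is its squared standard deviation, and the proportion explained is that variance divided by the total. A minimal sketch, again with the printed values hard-coded (the variable names are illustrative):

```r
# Standard deviations of the principal components, copied from the output above.
sdev <- c(1.6770917, 0.7943561, 0.6145604, 0.4227024)

# Proportion of variance explained by each component and its running total.
var_explained <- sdev^2 / sum(sdev^2)
cum_explained <- cumsum(var_explained)

round(100 * rbind(proportion = var_explained,
                  cumulative = cum_explained), 1)
```

With the fitted object at hand, `summary(PCA)` reports the same proportions.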

What are the contributions of single variables to the first two components?

We can see that all of the variables contribute roughly the same amount to the first principal component (loadings between 0.42 and 0.55), so this component can be read as the overall performance in the tests. The second component, on the other hand, is mostly formed by the picture completion score (loading 0.90).
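
One way to quantify these contributions is the percentage share of each variable's squared loading within a component. The sketch below assumes this definition of "contribution" (to my understanding it matches what factoextra reports for variables in a PCA, but treat that as an assumption) and uses the first two columns of the rotation matrix printed earlier; `loadings` and `contrib` are illustrative names.

```r
# Loadings of the four variables on PC1 and PC2, copied from the prcomp output.
loadings <- matrix(
  c(0.5490858, -0.1243367,
    0.5269680, -0.3145079,
    0.4976878, -0.2807389,
    0.4160726,  0.8982265),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Information", "Similarities", "Arithmetic", "Pictures"),
                  c("PC1", "PC2"))
)

# Contribution (in %) = squared loading divided by the column sum of squared loadings.
contrib <- 100 * sweep(loadings^2, 2, colSums(loadings^2), "/")

round(contrib, 1)
```

Each column sums to 100, so the values can be compared directly with the uniform share of 25 % per variable.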

Finally, let us take a look at the biplot of the first two components and see whether we are able to distinguish the two groups.

# Biplot of PC1 vs PC2 (factoextra), coloured by group, with 70 % ellipses per group
fviz_pca_biplot(PCA, habillage = WAIS$Senile, addEllipses = TRUE, ellipse.level = 0.7) +
  xlim(-5, 4) + ylim(-5, 4)

We can see that the senile group tends to have higher values of the first component. Since all loadings of the first component are positive, this suggests that they outperform the second group overall. The second component also helps to distinguish the two groups: the senile group tends to have lower values of this component, which means that they are weaker in the picture completion part in particular. This suggests that if we wanted to predict whether a person belongs to the senile group based on the test results, it might be useful to consider just the overall test score, with a higher weight on the picture completion part.